73 research outputs found

    Concept drift detection based on anomaly analysis

    Full text link
    © Springer International Publishing Switzerland 2014. In online machine learning, the ability to adapt to new concept quickly is highly desired. In this paper, we propose a novel concept drift detection method, which is called Anomaly Analysis Drift Detection (AADD), to improve the performance of machine learning algorithms under non-stationary environment. The proposed AADD method is based on an anomaly analysis of learner’s accuracy associate with the similarity between learners’ training domain and test data. This method first identifies whether there are conflicts between current concept and new coming data. Then the learner will incrementally learn the non conflict data, which will not decrease the accuracy of the learner on previous trained data, for concept extension. Otherwise, a new learner will be created based on the new data. Experiments illustrate that this AADD method can detect new concept quickly and learn extensional drift incrementally

    Heart failure hospitalization prediction in remote patient management systems

    Get PDF
    Healthcare systems are shifting from patient care in hospitals to monitored care at home. It is expected to improve the quality of care without exploding the costs. Remote patient management (RPM) systems offer a great potential in monitoring patients with chronic diseases, like heart failure or diabetes. Patient modeling in RPM systems opens opportunities in two broad directions: personalizing information services, and alerting medical personnel about the changing conditions of a patient. In this study we focus on heart failure hospitalization (HFH) prediction, which is a particular problem of patient modeling for alerting. We formulate a short term HFH prediction problem and show how to address it with a data mining approach. We emphasize challenges related to the heterogeneity, different types and periodicity of the data available in RPM systems. We present an experimental study on HFH prediction using, which results lay a foundation for further studies and implementation of alerting and personalization services in RPM systems

    Heart failure hospitalization prediction in remote patient management systems

    Full text link
    Healthcare systems are shifting from patient care in hospitals to monitored care at home. It is expected to improve the quality of care without exploding the costs. Remote patient management (RPM) systems offer a great potential in monitoring patients with chronic diseases, like heart failure or diabetes. Patient modeling in RPM systems opens opportunities in two broad directions: personalizing information services, and alerting medical personnel about the changing conditions of a patient. In this study we focus on heart failure hospitalization (HFH) prediction, which is a particular problem of patient modeling for alerting. We formulate a short term HFH prediction problem and show how to address it with a data mining approach. We emphasize challenges related to the heterogeneity, different types and periodicity of the data available in RPM systems. We present an experimental study on HFH prediction using, which results lay a foundation for further studies and implementation of alerting and personalization services in RPM systems

    Concept drift over geological times : predictive modeling baselines for analyzing the mammalian fossil record

    Get PDF
    Fossils are the remains organisms from earlier geological periods preserved in sedimentary rock. The global fossil record documents and characterizes the evidence about organisms that existed at different times and places during the Earth's history. One of the major directions in computational analysis of such data is to reconstruct environmental conditions and track climate changes over millions of years. Distribution of fossil animals in space and time make informative features for such modeling, yet concept drift presents one of the main computational challenges. As species continuously go extinct and new species originate, animal communities today are different from the communities of the past, and the communities at different times in the past are different from each other. The fossil record is continuously increasing as new fossils and localities are being discovered, but it is not possible to observe or measure their environmental contexts directly, because the time is gone. Labeled data linking organisms to climate is available only for the present day, where climatic conditions can be measured. The approach is to train models on the present day and use them to predict climatic conditions over the past. But since species representation is continuously changing, transfer learning approaches are needed to make models applicable and climate estimates to be comparable across geological times. Here we discuss predictive modeling settings for such paleoclimate reconstruction from the fossil record. We compare and experimentally analyze three baseline approaches for predictive paleoclimate reconstruction: (1) averaging over habitats of species, (2) using presence-absence of species as features, and (3) using functional characteristics of species communities as features. Our experiments on the present day African data and a case study on the fossil data from the Turkana Basin over the last 7 million of years suggest that presence-absence approaches are the most accurate over short time horizons, while species community approaches, also known as ecometrics, are the most informative over longer time horizons when, due to ongoing evolution, taxonomic relations between the present day and fossil species become more and more uncertain.Peer reviewe

    MobilityMirror: Bias-Adjusted Transportation Datasets

    Full text link
    We describe customized synthetic datasets for publishing mobility data. Private companies are providing new transportation modalities, and their data is of high value for integrative transportation research, policy enforcement, and public accountability. However, these companies are disincentivized from sharing data not only to protect the privacy of individuals (drivers and/or passengers), but also to protect their own competitive advantage. Moreover, demographic biases arising from how the services are delivered may be amplified if released data is used in other contexts. We describe a model and algorithm for releasing origin-destination histograms that removes selected biases in the data using causality-based methods. We compute the origin-destination histogram of the original dataset then adjust the counts to remove undesirable causal relationships that can lead to discrimination or violate contractual obligations with data owners. We evaluate the utility of the algorithm on real data from a dockless bike share program in Seattle and taxi data in New York, and show that these adjusted transportation datasets can retain utility while removing bias in the underlying data.Comment: Presented at BIDU 2018 workshop and published in Springer Communications in Computer and Information Science vol 92

    Can we identify non-stationary dynamics of trial-to-trial variability?"

    Get PDF
    Identifying sources of the apparent variability in non-stationary scenarios is a fundamental problem in many biological data analysis settings. For instance, neurophysiological responses to the same task often vary from each repetition of the same experiment (trial) to the next. The origin and functional role of this observed variability is one of the fundamental questions in neuroscience. The nature of such trial-to-trial dynamics however remains largely elusive to current data analysis approaches. A range of strategies have been proposed in modalities such as electro-encephalography but gaining a fundamental insight into latent sources of trial-to-trial variability in neural recordings is still a major challenge. In this paper, we present a proof-of-concept study to the analysis of trial-to-trial variability dynamics founded on non-autonomous dynamical systems. At this initial stage, we evaluate the capacity of a simple statistic based on the behaviour of trajectories in classification settings, the trajectory coherence, in order to identify trial-to-trial dynamics. First, we derive the conditions leading to observable changes in datasets generated by a compact dynamical system (the Duffing equation). This canonical system plays the role of a ubiquitous model of non-stationary supervised classification problems. Second, we estimate the coherence of class-trajectories in empirically reconstructed space of system states. We show how this analysis can discern variations attributable to non-autonomous deterministic processes from stochastic fluctuations. The analyses are benchmarked using simulated and two different real datasets which have been shown to exhibit attractor dynamics. As an illustrative example, we focused on the analysis of the rat's frontal cortex ensemble dynamics during a decision-making task. Results suggest that, in line with recent hypotheses, rather than internal noise, it is the deterministic trend which most likely underlies the observed trial-to-trial variability. Thus, the empirical tool developed within this study potentially allows us to infer the source of variability in in-vivo neural recordings

    Automated Adaptation Strategies for Stream Learning

    Get PDF
    Automation of machine learning model development is increasingly becoming an established research area. While automated model selection and automated data pre-processing have been studied in depth, there is, however, a gap concerning automated model adaptation strategies when multiple strategies are available. Manually developing an adaptation strategy can be time consuming and costly. In this paper we address this issue by proposing the use of flexible adaptive mechanism deployment for automated development of adaptation strategies. Experimental results after using the proposed strategies with five adaptive algorithms on 36 datasets confirm their viability. These strategies achieve better or comparable performance to the custom adaptation strategies and the repeated deployment of any single adaptive mechanism
    • …
    corecore